Audio Information Transforming Method, Video/Audio Format, 
Encoder, Audio Information Transforming Program, and Audio 
Information Transforming Device 

Background of the Invention

Field of the Invention 

The present invention relates to an audio information
transforming method, a video/audio format, an encoder, an
audio information transforming program, and an audio
information transforming device, which are employed in a
video/audio format like MPEG (Moving Picture Experts Group) 4
having video information and audio information for every
object, or a video/audio format like DVD (Digital Versatile
Disk) having video information and audio information for
every scene.

Description of the Related Art 

In recent years, video distribution based on the DVD or on
broadband streaming has been actively carried out, and thus
opportunities to handle video/audio formats in the home have
increased. In particular, since the DVD has spread and audio
apparatuses such as AV amplifiers have become inexpensive,
the number of persons who enjoy multi-channel audio has
increased. In the DVD, MPEG 2 is used as the video recording
system, and Dolby Digital (AC-3), DTS (Digital Theater
System), linear PCM (Pulse Code Modulation), MPEG audio, or
the like is used as the audio recording system. Eight audio
streams can be recorded on a DVD disk. Thus, if a different
sound is loaded on each audio stream, various applications
such as dubbing in plural languages, high sound quality
playback, commentary, sound track, etc. can be implemented.

Meanwhile, MPEG 4 is one of the next-generation video/audio
formats. In MPEG 4, attention is focused on the objects, each
having video/audio information, that constitute the scenes
replayed on the screen, and motion picture compression can be
attained effectively by coding the motion picture object by
object.

Also, among the technologies of motion picture recognition
processing, a technology of correcting the Doppler effect of
the sound emitted from a moving object in the image is set
forth in Patent Literature 1, for example.

[Patent Literature 1]

JP-A-5-174147 (see Paragraph 0013, etc.)
However, in the prior-art multi-channel (e.g., 5.1-channel,
etc.) audio system for playing the DVD, it is impossible to
change the listening point obtained from one audio stream.
Therefore, the listener can experience the audio only as it
is heard at the listening point at which the listener himself
or herself listens to the audio.

In addition, it is desired that the Doppler effect caused by
the movement of the object should be adjusted in response to
a change of the listening point.

The present invention has been made in view of the above
circumstances, and it is an object of the present invention
to provide an audio information transforming method, a
video/audio format, an encoder, an audio information
transforming program, and an audio information transforming
device, which are capable of changing a listening point
freely with only one audio stream, to thereby produce an
audio environment that enables the listener to feel as if he
or she were inside the video, and also of adjusting the
Doppler effect, which is caused by the movement of the
object, in response to a change of the listening point.



Summary of the Invention 

In order to attain the above object, an audio information
transforming method set forth in Claim 1, applied to a
video/audio format in which a screen includes a plurality of
objects and each object has video information, position
information, and audio information, comprises a virtual
listening point setting step of setting a virtual listening
point at a position different from a basic listening point
that is set as a position at which a listener listens to an
audio; a relative velocity calculating step of calculating a
relative velocity between the virtual listening point and the
object; and an audio frequency transforming step of executing
an audio frequency transformation based on the relative
velocity to add a Doppler effect to the audio information at
the virtual listening point.

According to such method, with respect to the object having
the video/audio information constituting the scene that is
replayed on the screen in the video/audio format such as
MPEG 4, for example, the Doppler effect can be added to the
audio information at the virtual listening point such that,
for example, the frequency of the sound is increased if the
object approaches the virtual listening point, or the
frequency of the sound is decreased if the object moves away
from the virtual listening point. Therefore, an audio
environment with strong appeal and realism, which enables the
listener to feel as if he or she had just entered the video
(at the virtual listening point), can be produced.
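
As an illustration of the audio frequency transformation
described above, the following is a minimal sketch in Python,
assuming the classical Doppler relation for a moving sound
source and a stationary virtual listening point. The function
name, the sign convention, and the speed of sound of 340 m/s
are assumptions made for this example and are not taken from
the claims.

```python
SPEED_OF_SOUND = 340.0  # m/s; assumed value for air, not given in the text


def doppler_frequency(source_hz, closing_speed, c=SPEED_OF_SOUND):
    """Return the frequency heard at a stationary virtual listening point.

    closing_speed is the speed of the object along the line toward the
    listening point: positive when approaching, negative when moving away.
    """
    if closing_speed >= c:
        raise ValueError("closing speed must be below the speed of sound")
    # Classical Doppler relation for a moving source, stationary listener.
    return source_hz * c / (c - closing_speed)
```

With this convention, a positive closing speed raises the
frequency and a negative closing speed lowers it, which
matches the behavior stated for an approaching and a receding
object.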

Also, in the audio information transforming method set forth
in Claim 2, the relative velocity calculating step calculates
the relative velocity between the virtual listening point and
the object by calculating velocity information of the object
based on position information of the object before and after
a predetermined time has elapsed.

According to such method, the Doppler effect is added to the
audio information at the virtual listening point by
calculating the velocity information of the object based on
the position information of the object before and after the
predetermined time has elapsed, and then calculating the
relative velocity between the virtual listening point and the
object. Therefore, the Doppler effect caused by the movement
of the object can be calculated and processed easily by using
the coded position information of the object. As a result, an
audio environment with appeal and realism, which enables the
listener to perceive through the audio that the object on the
screen is moving as heard from the virtual listening point,
can be produced.
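
The calculation described above can be sketched as follows.
This is only an illustrative example: the function names are
hypothetical, positions are assumed to be coordinate tuples in
meters, and the "relative velocity" is interpreted here as the
component of the object velocity along the line toward the
virtual listening point.

```python
import math


def velocity_from_positions(p_before, p_after, dt):
    """Estimate velocity from two positions separated by dt seconds."""
    if dt <= 0:
        raise ValueError("dt must be positive")
    return tuple((b - a) / dt for a, b in zip(p_before, p_after))


def closing_speed(obj_pos, obj_vel, listener_pos):
    """Speed of the object toward the listening point (positive = approaching)."""
    to_listener = [l - o for o, l in zip(obj_pos, listener_pos)]
    dist = math.sqrt(sum(d * d for d in to_listener))
    if dist == 0.0:
        return 0.0  # direction undefined when positions coincide (assumption)
    # Project the object velocity onto the unit vector toward the listener.
    return sum(v * d for v, d in zip(obj_vel, to_listener)) / dist
```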

Also, in the audio information transforming method set forth
in Claim 3, the relative velocity calculating step calculates
the relative velocity by extracting velocity information of
the object and then comparing the position information and
the velocity information of the object with position
information of the virtual listening point.

According to such method, the relative velocity is calculated
by extracting the velocity information of the object and then
comparing the position information and the velocity
information of the object with the position information of
the virtual listening point. Therefore, there is no need to
derive the velocity of the object by computation, the burden
of the calculating process can be reduced correspondingly,
and in addition the processing speed can be improved.
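
The difference between extracting the velocity and computing
it can be sketched as follows. The record layout (a dictionary
with "position" and optional "velocity" keys) is purely
hypothetical and is not the format defined in this
specification; it only illustrates that the computation is
skipped when velocity information is already present.

```python
def object_velocity(record, prev_position=None, dt=None):
    """Return the object velocity, preferring velocity stored in the record.

    record is a hypothetical dict with a "position" tuple and an
    optional "velocity" tuple.
    """
    stored = record.get("velocity")
    if stored is not None:
        return tuple(stored)  # extracted directly, no computation needed
    if prev_position is None or dt is None or dt <= 0:
        raise ValueError("need a previous position and positive dt to compute velocity")
    # Fallback: finite difference of positions before and after dt.
    return tuple((b - a) / dt for a, b in zip(prev_position, record["position"]))
```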

Also, in the audio information transforming method set forth
in Claim 4, the relative velocity calculating step calculates
the relative velocity between the virtual listening point and
the object by calculating velocity information of the virtual
listening point based on position information of the virtual
listening point before and after a predetermined time has
elapsed.

According to such method, the Doppler effect is added to the
audio information at the virtual listening point by
calculating the velocity information of the virtual listening
point based on the position information of the virtual
listening point before and after the predetermined time has
elapsed, and then calculating the relative velocity between
the virtual listening point and the object. Therefore, the
Doppler effect caused by the movement of the virtual
listening point can be calculated and processed easily by
using the position information of the virtual listening
point. As a result, an audio environment with appeal and
realism, which enables the listener to perceive through the
audio that the listener himself or herself (positioned at the
virtual listening point) is moving, can be produced.
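
When the virtual listening point itself moves, the frequency
heard there depends on both velocities. The following sketch
assumes the classical Doppler relation for a moving source and
a moving listener; the helper names and the 340 m/s speed of
sound are assumptions for illustration only.

```python
import math


def speed_toward(from_pos, to_pos, vel):
    """Component of vel along the direction from from_pos to to_pos."""
    d = [t - f for f, t in zip(from_pos, to_pos)]
    dist = math.sqrt(sum(x * x for x in d))
    if dist == 0.0:
        return 0.0  # direction undefined when positions coincide (assumption)
    return sum(v * x for v, x in zip(vel, d)) / dist


def doppler_both_moving(source_hz, obj_pos, obj_vel, lis_pos, lis_vel, c=340.0):
    """Frequency heard at a moving virtual listening point."""
    v_source = speed_toward(obj_pos, lis_pos, obj_vel)    # object toward listener
    v_listener = speed_toward(lis_pos, obj_pos, lis_vel)  # listener toward object
    if v_source >= c:
        raise ValueError("source speed must be below the speed of sound")
    return source_hz * (c + v_listener) / (c - v_source)
```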

In the audio information transforming method set forth in
Claim 5, the relative velocity calculating step calculates
the relative velocity by extracting velocity information of
the virtual listening point and then comparing position
information and the velocity information of the virtual
listening point with the position information of the object.

According to such method, the relative velocity is calculated
by extracting the velocity information of the virtual
listening point and then comparing the position information
and the velocity information of the virtual listening point
with the position information of the object. Therefore, there
is no need to derive the velocity of the virtual listening
point by computation, the burden of the calculating process
can be reduced correspondingly, and in addition the
processing speed can be improved.

Also, an audio information transforming method set forth in
Claim 6, applied to a video/audio format in which each scene
that is replayed on a screen has video information and audio
information, and the scene has velocity information and
direction information based on which a background is moved,
comprises a virtual listening point setting step of setting a
virtual listening point at a position different from a basic
listening point that is set as a position at which a listener
listens to an audio; a relative velocity calculating step of
calculating a relative velocity between the virtual listening
point and the background based on the velocity information
and the direction information of the background; and an audio
frequency transforming step of transforming an audio
frequency based on the relative velocity to add a Doppler
effect to the audio information at the virtual listening
point.

According to such method, with respect to the scene that is
replayed on the screen in the video/audio format such as DVD,
for example, the Doppler effect is added to the audio
information at the virtual listening point in response to the
moving speed of the background. Therefore, an audio
environment with strong appeal and realism, which enables the
listener to feel as if he or she had just entered the video
(at the virtual listening point) and to perceive through the
audio that the background of the screen is moving as heard
from the virtual listening point, can be produced.
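
As a rough illustration of how velocity information and
direction information of a background could be combined, the
sketch below converts a speed and a direction into a velocity
vector and subtracts the velocity of the virtual listening
point. Representing the direction as an angle in degrees in a
two-dimensional plane is an assumption of this example; the
specification does not state how the direction is encoded.

```python
import math


def background_velocity(speed, direction_deg):
    """Velocity vector of the background from its speed and direction angle."""
    rad = math.radians(direction_deg)
    return (speed * math.cos(rad), speed * math.sin(rad))


def relative_velocity(background_vel, listener_vel=(0.0, 0.0)):
    """Background velocity as seen from the virtual listening point."""
    return tuple(b - l for b, l in zip(background_vel, listener_vel))
```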

In an audio information transforming method set forth in
Claim 7, when audio information that already includes the
Doppler effect is included in the object, the audio frequency
transforming step executes an audio frequency transformation
to cancel the Doppler effect included in the audio
information of the object, and then executes the audio
frequency transformation based on the relative velocity to
add the Doppler effect to the audio information at the
virtual listening point.

According to such method, in the case that audio information
that already includes the Doppler effect is included in the
object, first the Doppler effect included in the audio
information is canceled, and then the Doppler effect is added
to the audio information at the virtual listening point.
Therefore, even if the Doppler effect is included in the
audio information prior to the transformation, the Doppler
effect caused when the object on the screen moves as heard
from the virtual listening point can be expressed precisely.
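
The cancel-then-add sequence can be sketched as follows,
assuming the recorded audio carries the Doppler shift that
would be heard at the basic listening point and that the
classical moving-source relation applies. The function names
and the 340 m/s speed of sound are assumptions of this
example.

```python
def doppler_ratio(closing_speed, c=340.0):
    """Ratio of heard to emitted frequency for a moving source."""
    if closing_speed >= c:
        raise ValueError("closing speed must be below the speed of sound")
    return c / (c - closing_speed)


def retarget_frequency(recorded_hz, closing_basic, closing_virtual, c=340.0):
    """Cancel the shift baked in for the basic point, then add the virtual one."""
    emitted_hz = recorded_hz / doppler_ratio(closing_basic, c)  # cancel
    return emitted_hz * doppler_ratio(closing_virtual, c)       # add
```

If the closing speeds toward the basic and the virtual
listening points are equal, the two steps cancel and the
recorded frequency is returned unchanged.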

In the audio information transforming method set forth in
Claim 8, the audio information transformation at the time of
the final image unit is executed by adding the Doppler effect
to the audio information at the virtual listening point by
using the formula with which the audio frequency
transformation of the audio information at the virtual
listening point was executed one image unit prior to the
final image.

According to such method, in the case that the position
information of the succeeding screen cannot be obtained at
the time of the final image of the title that is now being
replayed, for example, the audio frequency of the object,
which is heard at the virtual listening point, can be
calculated by using the formula of the audio frequency
transformation that was obtained in the audio frequency
transformation processing for the image preceding the final
image. Therefore, the possibility that the audio frequency
transformation cannot be executed in the final image of the
title, or the like, because of a lack of information can be
eliminated.
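
The handling of the final image unit can be sketched as
follows. This example assumes that one distance between the
object and the virtual listening point is available per image
unit and that the closing speed for unit i is derived from
units i and i+1; both the data layout and the fallback of 1.0
for a single image unit are assumptions of this sketch.

```python
def doppler_ratios(distances, dt, c=340.0):
    """Frequency ratio per image unit from successive object distances.

    distances[i] is the distance between the object and the virtual
    listening point at image unit i; dt is the time per image unit.
    """
    if dt <= 0:
        raise ValueError("dt must be positive")
    ratios = []
    for i in range(len(distances)):
        if i + 1 < len(distances):
            closing = (distances[i] - distances[i + 1]) / dt
            ratios.append(c / (c - closing))
        elif ratios:
            # Final image unit: no succeeding position, reuse previous formula.
            ratios.append(ratios[-1])
        else:
            # Only one image unit: no motion information (assumed no shift).
            ratios.append(1.0)
    return ratios
```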

In the audio information transforming method set forth in
Claim 9, the video/audio format includes reduced scale
information of the screen for every scene.

According to such method, when the reduced scale of the
screen is changed by zoom-in, zoom-out, or the like of the
replayed screen, the audio information transformation set
forth in Claims 1 to 8 can be executed precisely.
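
One way the reduced scale information could enter the
calculation is sketched below: a displacement measured in
screen coordinates is converted to a real-world velocity
before the Doppler calculation. Treating the reduced scale as
the ratio of screen length to real length (for example 0.01
for 1:100) is an assumption of this example.

```python
def real_velocity(screen_before, screen_after, dt, reduced_scale):
    """Real-world velocity from screen positions and the scene's reduced scale.

    reduced_scale is assumed to be screen length divided by real length.
    """
    if dt <= 0 or reduced_scale <= 0:
        raise ValueError("dt and reduced_scale must be positive")
    return tuple(
        (after - before) / (reduced_scale * dt)
        for before, after in zip(screen_before, screen_after)
    )
```

Under this convention, the same on-screen displacement
corresponds to a smaller real velocity after a zoom-in, since
the reduced scale value becomes larger.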

A video/audio format set forth in Claim 10 includes velocity
information of the object, or velocity information and
direction information of the scene, or reduced scale
information of the screen for every scene, which are employed
in the audio information transforming method set forth in any
one of Claims 1 to 9.

An encoder set forth in Claim 11 encodes velocity information
of the object, or velocity information and direction
information of the scene, or reduced scale information of the
screen for every scene, which are employed in the audio
information transforming method set forth in any one of
Claims 1 to 9.

According to such encoder, the velocity information of the
object, the velocity information and the direction
information of the scene, and the reduced scale information
of the screen for every scene are encoded, and then these
pieces of information are included in the video/audio format.
Therefore, the audio information transformation set forth in
any one of Claims 1 to 9 can be implemented.

In order to attain the above object, an audio information
transforming program set forth in Claim 12 causes a computer
to execute a procedure of setting a virtual listening point
at a position different from a basic listening point that is
set as a position at which a listener listens to an audio; a
procedure of calculating a relative velocity between the
virtual listening point and the object; and a procedure of
executing an audio frequency transformation based on the
relative velocity to add a Doppler effect to the audio
information at the virtual listening point.

According to such program, with respect to the object having
the video/audio information constituting the scene that is
replayed on the screen in the video/audio format such as
MPEG 4, for example, the Doppler effect can be added to the
audio information at the virtual listening point such that,
for example, the frequency of the sound is increased if the
object approaches the virtual listening point, or the
frequency of the sound is decreased if the object moves away
from the virtual listening point. Therefore, if a recording
medium (a memory such as a ROM, or the like) on which this
program is recorded is employed, a video/audio player (DVD
player, LD player, game, MPEG player, system in the movie
theater, etc.) that can produce an audio environment with
appeal and realism, which permits the listener to feel as if
he or she had just entered the video (at the virtual
listening point), can be implemented.

In the audio information transforming program set forth in
Claim 13, the procedure of calculating the relative velocity
includes a procedure of calculating velocity information of
the object based on position information of the object before
and after a predetermined time has elapsed.

According to such program, since the procedure of calculating
the relative velocity calculates the velocity information of
the object based on the position information of the object
before and after the predetermined time has elapsed, the
Doppler effect caused by the movement of the object can be
calculated and processed easily by using the coded position
information of the object. Therefore, if a recording medium
(a memory such as a ROM, or the like) on which this program
is recorded is employed, a video/audio player (DVD player, LD
player, game, MPEG player, system in the movie theater, etc.)
that can produce an audio environment with appeal and
realism, which enables the listener to perceive through the
audio that the object on the screen is moving as heard from
the virtual listening point, can be implemented.

In the audio information transforming program set forth in
Claim 14, the procedure of calculating the relative velocity
includes a procedure of extracting velocity information of
the object and then comparing the position information and
the velocity information of the object with position
information of the virtual listening point.

According to such program, since the procedure of calculating
the relative velocity extracts the velocity information of
the object and then compares the position information and the
velocity information of the object with the position
information of the virtual listening point, there is no need
to derive the velocity of the object by computation, the
burden of the calculating process can be reduced
correspondingly, and in addition the processing speed can be
improved. Therefore, if a recording medium (a memory such as
a ROM, or the like) on which this program is recorded is
employed, a video/audio player (DVD player, LD player, game,
MPEG player, system in the movie theater, etc.) that can
produce an audio environment with appeal and realism, which
enables the listener to perceive through the audio that the
object on the screen is moving as heard from the virtual
listening point, can be implemented.

In the audio information transforming program set forth in
Claim 15, the procedure of calculating the relative velocity
includes a procedure of calculating velocity information of
the virtual listening point based on position information of
the virtual listening point before and after a predetermined
time has elapsed.

According to such program, since the velocity information of
the virtual listening point is calculated based on the
position information of the virtual listening point before
and after the predetermined time has elapsed, the Doppler
effect caused by the movement of the virtual listening point
can be calculated and processed easily by using the position
information of the virtual listening point. Therefore, if a
recording medium (a memory such as a ROM, or the like) on
which this program is recorded is employed, a video/audio
player (DVD player, LD player, game, MPEG player, system in
the movie theater, etc.) that can produce an audio
environment with appeal and realism, which enables the
listener to perceive through the audio that the listener
himself or herself (positioned at the virtual listening
point) is moving, can be implemented.

In the audio information transforming program set forth in
Claim 16, the procedure of calculating the relative velocity
includes a procedure of calculating the relative velocity by
extracting velocity information of the virtual listening
point and then comparing position information and the
velocity information of the virtual listening point with the
position information of the object.

According to such program, the relative velocity is
calculated by extracting the velocity information of the
virtual listening point and then comparing the position
information and the velocity information of the virtual
listening point with the position information of the object.
Therefore, there is no need to derive the velocity of the
virtual listening point by computation, the burden of the
calculating process can be reduced correspondingly, and in
addition the processing speed can be improved. As a result,
if a recording medium (a memory such as a ROM, or the like)
on which this program is recorded is employed, a video/audio
player (DVD player, LD player, game, MPEG player, system in
the movie theater, etc.) that can produce an audio
environment with appeal and realism, which enables the
listener to perceive through the audio that the listener
himself or herself is moving, can be implemented.

An audio information transforming program set forth in
Claim 17 causes a computer to execute a procedure of setting
a virtual listening point at a position different from a
basic listening point that is set as a position at which a
listener listens to an audio; a procedure of calculating a
relative velocity between the virtual listening point and a
background according to a velocity and a direction based on
which the background of a scene is moved; and a procedure of
executing an audio frequency transformation based on the
relative velocity to add a Doppler effect to the audio
information at the virtual listening point.

According to such program, with respect to the scene that is
replayed on the screen in the video/audio format such as DVD,
for example, the Doppler effect is added to the audio
information at the virtual listening point in response to the
moving speed of the background. Therefore, if a recording
medium (a memory such as a ROM, or the like) on which this
program is recorded is employed, a video/audio player (DVD
player, LD player, game, MPEG player, system in the movie
theater, etc.) that can produce an audio environment with
strong appeal and realism can be implemented.

In the audio information transforming program set forth in
Claim 18, when audio information that already includes the
Doppler effect is included in the object, the procedure of
executing an audio frequency transformation includes a
procedure of executing an audio frequency transformation to
cancel the Doppler effect included in the audio information
of the object, and then executing the audio frequency
transformation based on the relative velocity to add the
Doppler effect to the audio information at the virtual
listening point.

According to such program, in the case that audio information
that already includes the Doppler effect is included in the
object, first the Doppler effect included in the audio
information is canceled, and then the Doppler effect is added
to the audio information at the virtual listening point.
Therefore, even if the Doppler effect is included in the
audio information prior to the transformation, the Doppler
effect caused when the object on the screen moves as heard
from the virtual listening point can be expressed precisely.
As a result, if a recording medium (a memory such as a ROM,
or the like) on which this program is recorded is employed, a
video/audio player (DVD player, LD player, game, MPEG player,
system in the movie theater, etc.) that can produce an audio
environment with strong appeal and realism can be
implemented.

In the audio information transforming program set forth in
Claim 19, a procedure is included in which, when the audio
information transformation at the time of the final image
unit is executed, the Doppler effect is added to the audio
information at the virtual listening point by using the
formula with which the audio frequency transformation of the
audio information at the virtual listening point was executed
one image unit prior to the final image.

According to such program, in the case that the position
information of the succeeding screen cannot be obtained at
the time of the final image of the title that is now being
replayed, for example, the audio frequency of the object,
which is heard at the virtual listening point, can be
calculated by using the formula of the audio frequency
transformation that was obtained in the audio frequency
transformation processing for the image preceding the final
image. Therefore, the possibility that the audio frequency
transformation cannot be executed in the final image of the
title, or the like, because of a lack of information can be
eliminated. As a result, if a recording medium (a memory such
as a ROM, or the like) on which this program is recorded is
employed, a video/audio player (DVD player, LD player, game,
MPEG player, system in the movie theater, etc.) that can
produce an audio environment with strong appeal and realism
can be implemented.

In the audio information transforming program set forth in
Claim 20, the video/audio format includes reduced scale
information of the screen for every scene.

According to such program, when the reduced scale of the
screen is changed by zoom-in, zoom-out, or the like of the
replayed screen, the audio information transformation can be
executed precisely. Therefore, if a recording medium (a
memory such as a ROM, or the like) on which this program is
recorded is employed, a video/audio player (DVD player, LD
player, game, MPEG player, system in the movie theater, etc.)
that can produce an audio environment with strong appeal and
realism can be implemented.

In order to attain the above object, an audio information
transforming device set forth in Claim 21, for a video/audio
format in which a screen includes a plurality of objects and
each object has video information, position information, and
audio information, comprises a virtual listening point
setting section for setting a virtual listening point at a
position different from a basic listening point that is set
as a position at which a listener listens to an audio; a
relative velocity calculating section for calculating a
relative velocity between the virtual listening point and the
object; and an audio frequency transforming section for
executing an audio frequency transformation based on the
relative velocity to add a Doppler effect to the audio
information at the virtual listening point.

According to such device, with respect to the object having
the video/audio information constituting the scene that is
replayed on the screen in the video/audio format such as
MPEG 4, for example, the Doppler effect can be added to the
audio information at the virtual listening point such that,
for example, the frequency of the sound is increased if the
object approaches the virtual listening point, or the
frequency of the sound is decreased if the object moves away
from the virtual listening point. Therefore, if this audio
information transforming device is employed, an audio
environment with strong appeal and realism, which enables the
listener to feel as if he or she had just entered the video
(at the virtual listening point), can be produced.
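
The three sections of such a device can be pictured as three
methods of one class, as in the sketch below. This is only an
illustrative arrangement: the class and attribute names are
hypothetical, positions and velocities are assumed to be
coordinate tuples, the virtual listening point is assumed to
be stationary, and the classical moving-source Doppler
relation with a 340 m/s speed of sound is assumed.

```python
import math
from dataclasses import dataclass


@dataclass
class AudioObject:
    position: tuple   # object position (assumed meters)
    velocity: tuple   # object velocity (assumed meters per second)
    source_hz: float  # frequency of the sound emitted by the object


class AudioTransformer:
    def __init__(self, basic_point, c=340.0):
        self.basic_point = basic_point
        self.virtual_point = basic_point
        self.c = c

    def set_virtual_listening_point(self, point):
        """Setting section: the virtual point must differ from the basic one."""
        if point == self.basic_point:
            raise ValueError("virtual listening point must differ from basic point")
        self.virtual_point = point

    def relative_velocity(self, obj):
        """Calculating section: object speed toward the virtual listening point."""
        d = [v - o for o, v in zip(obj.position, self.virtual_point)]
        dist = math.sqrt(sum(x * x for x in d))
        if dist == 0.0:
            return 0.0
        return sum(v * x for v, x in zip(obj.velocity, d)) / dist

    def transform_frequency(self, obj):
        """Transforming section: frequency heard at the virtual listening point."""
        closing = self.relative_velocity(obj)
        if closing >= self.c:
            raise ValueError("closing speed must be below the speed of sound")
        return obj.source_hz * self.c / (self.c - closing)
```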

In the audio information transforming device set forth in
Claim 22, the relative velocity calculating section
calculates the relative velocity by comparing the position
information of the virtual listening point and the position
information of the object with the position information of
the virtual listening point and the position information of
the object after a predetermined time has elapsed.

According to such device, an audio environment with appeal
and realism, which enables the listener to feel as if he or
she had just entered the video (at the virtual listening
point), and to perceive through the audio either that the
object on the screen is moving as heard from the virtual
listening point or that the listener himself or herself is
moving, can be produced.

In the audio information transforming device set forth in
Claim 23, the relative velocity calculating section
calculates the relative velocity by comparing the position
information and velocity information of the object with the
position information of the virtual listening point.

According to such device, an audio environment with appeal
and realism, which enables the listener to feel as if he or
she had just entered the video (at the virtual listening
point) and to perceive through the audio that the object on
the screen is moving as heard from the virtual listening
point, can be produced.

In the audio information transforming device set forth in
Claim 24, the relative velocity calculating section
calculates the relative velocity by comparing the position
information of the object with the position information and
velocity information of the virtual listening point.

According to such device, an audio environment with appeal
and realism, which enables the listener to feel as if he or
she had just entered the video (at the virtual listening
point) and to perceive through the audio that the listener
himself or herself (positioned at the virtual listening
point) is moving, can be produced.

An audio information transforming device set forth in
Claim 25, for a video/audio format in which each scene that
is replayed on a screen has video information and audio
information, and the scene has velocity information and
direction information based on which a background is moved,
comprises a virtual listening point setting section for
setting a virtual listening point at a position different
from a basic listening point that is set as a position at
which a listener listens to an audio; a relative velocity
calculating section for calculating a relative velocity
between the virtual listening point and the background based
on the velocity information and the direction information of
the background; and an audio frequency transforming section
for executing an audio frequency transformation based on the
relative velocity to add a Doppler effect to the audio
information at the virtual listening point.

According to such device, with respect to the scene that is
replayed on the screen in the video/audio format such as DVD,
for example, the Doppler effect is added to the audio
information at the virtual listening point in response to the
moving speed of the background. Therefore, an audio
environment with appeal and realism, which enables the
listener to feel as if he or she had just entered the video
(at the virtual listening point) and to perceive through the
audio that the background of the screen is moving as heard
from the virtual listening point, can be produced.

Brief Description of the Drawings

Fig. 1 is a view explaining an audio information
transforming method according to a first embodiment of the
present invention;

Fig. 2 is a view explaining the audio information
transforming method according to the first embodiment of
the present invention;

Fig. 3 is a view explaining an audio information
transforming method according to a second embodiment of the
present invention, and an image view of a scene describing
format;

Fig. 4 is a view explaining the audio information
transforming method according to the second embodiment of
the present invention, and a view showing an example of a
video/audio format;

Fig. 5 is a view explaining an audio information
transforming method according to a third embodiment of the
present invention;

Fig. 6 is a view explaining an audio information
transforming method according to a fourth embodiment of the
present invention;

Fig. 7 is a view explaining an audio information
transforming method according to a sixth embodiment of the
present invention;

Fig. 8 is a view explaining the audio information
transforming method according to the sixth embodiment of
the present invention;

Fig. 9 is a view explaining the audio information
transforming method according to the sixth embodiment of
the present invention;

Fig. 10 is a view explaining the audio information
transforming method according to the sixth embodiment of
the present invention, and a view showing an example of a
video/audio format;

Fig. 11 is a view explaining an audio information
transforming method according to an eighth embodiment of
the present invention;

Fig. 12 is a view explaining the audio information
transforming method according to the eighth embodiment of
the present invention;

Fig. 13 is a view explaining an audio information
transforming method according to a ninth embodiment of the
present invention;

Fig. 14 is a view explaining the audio information
transforming method according to a tenth embodiment of the
present invention, and a view showing an example of a
video/audio format; and

Fig. 15 is a block diagram showing an example of an
audio information transforming system of this invention.

In the drawings, the reference numerals 1, 2, and 3
each refer to an object; 100 and 801 to a screen; 101, 102,
701, and 1002 to a virtual listening point; 1001 to a basic
listening point; 1201 to a time axis; 1500 to an audio
information transforming device; 1510 to a video/audio
format; 1520 to a virtual listening point setting section;
1530 to a relative velocity calculating section; and 1540
to an audio frequency transforming section.



Detailed Description of the Preferred Embodiments 

Embodiments of the present invention will be
explained in detail hereinafter with reference to the
drawings.

(First Embodiment)

FIG. 1 is a view explaining a first embodiment of
the present invention.

In FIG. 1, a virtual listening point 101 is set
in a screen 100. Also, assume that a video object 1 having
audio information is moving from the left to the right of
the screen 100. Then, if the coordinates of the virtual
listening point 101 are (x1, y1, z1), the current
position of the object 1 is P1 (xa, ya, za) in FIG. 2, and
the position after a time t has elapsed is P2 (xb, yb, zb)
in FIG. 2, the vector between P1 and P2 is given by
Equation (1).

[Formula 1]

$$\overrightarrow{P_1 P_2} = (x_b - x_a,\ y_b - y_a,\ z_b - z_a) \quad \cdots (1)$$

The velocity of the object 1 is calculated by taking
the unit of time into account. If the velocity of the
object 1 is set to V1, this velocity is given by
Equation (2).

[Formula 2]

$$V_1 = k\,(x_b - x_a,\ y_b - y_a,\ z_b - z_a) \quad \cdots (2)$$

where k is a constant.

Then, cos θ is calculated by using the angle θ
between a vector directed from the position P1 to the
virtual listening point 101 and a vector directed from the
position P1 to the position P2, as shown in FIG. 2. The
component V1' of the velocity V1 of the object 1 in the
direction from the position P1 to the virtual listening
point 101 can then be represented by Equation (3).

[Formula 3]

$$V_1' = V_1 \cos\theta \quad \cdots (3)$$

Here, if the velocity of sound is v, the audio
frequency of the sound source is f, and the audio
frequency of the sound heard at the virtual listening point
101 is f1, this audio frequency f1 can be represented by
Equation (4).

[Formula 4]

$$f_1 = \frac{v}{v - V_1'}\, f \quad \cdots (4)$$

As can be seen from Equation (4), wherever the
virtual listening point 101 is set, the listener can enjoy
the audio with stronger reality because the audio
frequency of the audio information that is heard at the
virtual listening point 101 is changed accordingly.
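
The calculation of Equations (1) to (4) can be sketched in Python as follows. This is only an illustrative sketch, not the implementation of the invention: the function name, the choice k = 1/t for the constant in Equation (2), and the default speed of sound of 340 m/s are assumptions made for the example.

```python
import math

def doppler_moving_source(p1, p2, listener, t, f, v=340.0):
    """Frequency heard at a fixed listening point for a moving source.

    p1, p2   -- source positions now and after t seconds, as (x, y, z)
    listener -- virtual listening point, as (x, y, z)
    f        -- source frequency in Hz
    v        -- speed of sound (assumed 340 m/s)
    """
    # Equations (1) and (2): displacement P1 -> P2 scaled by k (assumed k = 1/t).
    v1 = [(b - a) / t for a, b in zip(p1, p2)]
    # Vector from P1 to the listening point and its length.
    to_listener = [l - a for a, l in zip(p1, listener)]
    dist = math.sqrt(sum(c * c for c in to_listener))
    if dist == 0.0:
        return f  # source coincides with the listener; direction undefined
    # Equation (3): V1' = V1 cos(theta), obtained here from a dot product.
    v1_radial = sum(a * b for a, b in zip(v1, to_listener)) / dist
    if v1_radial >= v:
        raise ValueError("source approaches at or above the speed of sound")
    # Equation (4): f1 = v / (v - V1') * f
    return v / (v - v1_radial) * f
```

With this sketch, a source moving toward the listening point yields f1 above f, and a source moving at right angles to the line of sight leaves f unchanged.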

As described above, in the present embodiment, the
virtual listening point 101 is set at a position
different from the basic listening point that is set as a
position at which the listener listens to the audio, then a
relative velocity between the virtual listening point 101
and the object 1 is calculated based on position
information of the virtual listening point 101 and position
information of the object 1, and then the audio frequency
at the virtual listening point 101 is changed according to
the calculated relative velocity. Therefore, a realistic
sound field can be generated while the virtual listening
point 101, at which the listener virtually exists, is moved
freely.

(Second Embodiment) 

FIG. 3 is a view explaining a second embodiment of
the present invention.

In the above first embodiment, the velocity of the
object 1 is calculated based on the coordinate information,
and the audio frequency of the audio that is heard at the
virtual listening point 101 is changed on the basis of that
velocity. However, if the object 1 already includes
velocity information per unit of time, such calculation is
not needed. In the present embodiment, if the video/audio
format has velocity information that has been encoded
previously by an encoder, or the like, such velocity
information is extracted and then the audio frequency of
the audio that is heard at the virtual listening point is
calculated based on such information.

In the video/audio format described by the format
shown in FIG. 3, velocity information of the objects 1,
2, ..., n is obtained. As in the first embodiment, if the
velocity of the object 1 is set to V1, the velocity
component V1' directed from the object 1 to the virtual
listening point 101 can be represented, as shown in
Equation (5), by using the angle θ shown in FIG. 2.

[Formula 5]

$$V_1' = V_1 \cos\theta \quad \cdots (5)$$

Here, if the velocity of sound is v, the audio
frequency of the sound from the sound source is f, and the
audio frequency of the sound heard at the virtual listening
point 101 is f1, this audio frequency f1 can be represented
by Equation (6).

[Formula 6]

$$f_1 = \frac{v}{v - V_1'}\, f \quad \cdots (6)$$

If the audio frequency of the audio information that
is heard at the virtual listening point 101 is changed
according to Equation (6), the listener can enjoy realistic
audio wherever the virtual listening point 101 is set.

Meanwhile, in order to implement the present embodiment,
the velocity information and the direction information of
the object 1 must be described in the object information.
For example, as shown in FIG. 4, when the velocity
information and the direction information are included in
the information at a certain time within the object 1
information, audio that takes the Doppler effect into
account can be generated by using this information.
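
As a rough illustration of how velocity and direction information carried in the object information could be used, the following Python sketch applies Equations (5) and (6). The record layout (`ObjectInfo` with `position`, `speed`, and `direction` fields) is hypothetical and does not reflect the actual format of FIG. 4; the speed of sound of 340 m/s is likewise an assumption.

```python
import math
from dataclasses import dataclass

@dataclass
class ObjectInfo:
    """Hypothetical per-object record; field names are illustrative only."""
    position: tuple   # (x, y, z) of the object at a certain time
    speed: float      # magnitude of the velocity V1
    direction: tuple  # moving direction (x, y, z); need not be normalised

def approach_speed(obj, listener):
    """Equation (5): component V1' of the velocity toward the listening point."""
    to_listener = [l - p for p, l in zip(obj.position, listener)]
    d_norm = math.sqrt(sum(c * c for c in obj.direction))
    l_norm = math.sqrt(sum(c * c for c in to_listener))
    if d_norm == 0.0 or l_norm == 0.0:
        return 0.0  # no motion, or object located at the listening point
    cos_theta = sum(a * b for a, b in zip(obj.direction, to_listener)) / (d_norm * l_norm)
    return obj.speed * cos_theta

def heard_frequency(obj, listener, f, v=340.0):
    """Equation (6): f1 = v / (v - V1') * f, with v assumed to be 340 m/s."""
    return v / (v - approach_speed(obj, listener)) * f
```

Because the velocity is read from the record, no coordinate differencing between two times is needed, which is the point of the present embodiment.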

In this fashion, according to the present
embodiment, the virtual listening point 101 is set at a
position different from the basic position at which the
listener listens to the sound of the object 1, then an
approaching or leaving velocity of the object 1 that is
observed at the virtual listening point 101 is calculated
based on the velocity information and the moving direction
information of the object 1 and the position information of
the virtual listening point 101, and then the audio
frequency of the audio that is heard at the virtual
listening point 101 is changed according to the calculated
velocity. Therefore, it is possible to give stronger
appeal and reality than in the first embodiment to
the audio that is heard at the virtual listening point 101.
According to the obtained relative velocity, the audio
frequency transforming section changes the audio frequency
information of the virtual listening point 101.

(Third Embodiment)

FIG. 5 is a view explaining a third embodiment of 
the present invention. 
In FIG. 5, assume that a virtual listening point 102
is moved rightward in the screen. Also, assume that a
video object 2 having the audio information is not moved.
Then, if the coordinates of the object 2 are (x1, y1, z1)
shown in FIG. 5, the current position of the virtual
listening point 102 is P1 (xa, ya, za) in FIG. 5, and
the position after the time t has elapsed is P2 (xb,
yb, zb), the vector between P1 and P2 can be represented by
Equation (7).

[Formula 7]

$$\overrightarrow{P_1 P_2} = (x_b - x_a,\ y_b - y_a,\ z_b - z_a) \quad \cdots (7)$$

The velocity of the virtual listening point 102 is
calculated by taking the unit of time into account. If the
velocity of the virtual listening point 102 is set to V1,
this velocity V1 can be represented by Equation (8).

[Formula 8]

$$V_1 = k\,(x_b - x_a,\ y_b - y_a,\ z_b - z_a) \quad \cdots (8)$$

where k is a constant.

Then, cos θ is calculated by using the angle θ between
a vector directed from the object 2 to the position P1 and
a vector directed from the position P1 to the position P2,
as shown in FIG. 5. The component V1' of the velocity
V1 of the virtual listening point 102 in the direction
from the object 2 to the position P1 can then be
represented by Equation (9).

[Formula 9]

$$V_1' = V_1 \cos\theta \quad \cdots (9)$$

Here, if the velocity of sound is v, the audio
frequency of the sound emitted from the sound source is f,
and the audio frequency of the sound heard at the virtual
listening point 102 is f1, this audio frequency f1 can be
represented by Equation (10).

[Formula 10]

$$f_1 = \frac{v - V_1'}{v}\, f \quad \cdots (10)$$

As a result, wherever the virtual listening
point 102 is set, the listener can enjoy the audio with
stronger reality because the audio frequency of the audio
information that is heard at the virtual listening point
102 is changed accordingly.

As described above, according to the present
embodiment, the virtual listening point 102 is set at
a position different from the basic listening point at
which the listener listens to the audio of the object 2,
then the velocity of the virtual listening point 102, as
observed from the object 2, is calculated based on the
position information of the object 2 and the position
information of the virtual listening point 102 when such
virtual listening point 102 is moved, and then the audio
frequency of the audio that is heard at the virtual
listening point 102 is changed according to the calculated
velocity. Therefore, even if the virtual listening point
102 is moved to any place, a realistic sound field
can be generated.
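
The moving-listener case of Equations (7) to (10) can be sketched in Python as below. As before, this is a sketch under stated assumptions rather than a definitive implementation: the function name, k = 1/t, and the 340 m/s speed of sound are chosen for illustration.

```python
import math

def doppler_moving_listener(source, p1, p2, t, f, v=340.0):
    """Frequency heard at a moving listening point for a fixed source.

    source -- position of the stationary object 2, as (x, y, z)
    p1, p2 -- listening point positions now and after t seconds
    f      -- source frequency in Hz; v is the speed of sound (assumed 340 m/s)
    """
    # Equations (7) and (8): listener velocity, with k assumed to be 1/t.
    v1 = [(b - a) / t for a, b in zip(p1, p2)]
    # Vector from the object to P1 (the direction pointing away from the source).
    away = [a - s for s, a in zip(source, p1)]
    dist = math.sqrt(sum(c * c for c in away))
    if dist == 0.0:
        return f  # listener located at the source; direction undefined
    # Equation (9): V1' = V1 cos(theta), component along the object -> P1 direction.
    v1_radial = sum(a * b for a, b in zip(v1, away)) / dist
    # Equation (10): f1 = (v - V1') / v * f
    return (v - v1_radial) / v * f
```

A listening point receding from the object gives a positive V1' and therefore a lower f1; one approaching the object gives a negative V1' and a higher f1.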

(Fourth Embodiment) 

FIG. 6 is a view explaining a fourth embodiment of
the present invention.

As in FIG. 5 above, assume that the virtual
listening point 102 is moved rightward in the screen.
Also, assume that the video object 2 having the audio
information is not moved. Then, assume that the
coordinates of the object 2 are (x1, y1, z1) shown in
FIG. 5, that the virtual listening point 102 has the
velocity information (including also the direction
information), and that the velocity is set to V1.

Then, cos θ is calculated by using the angle θ
between a vector directed from the object 2 to the position
P1 and a vector directed from the position P1 to the
position P2, as shown in FIG. 5. The component V1' of the
velocity V1 of the virtual listening point 102 in the
direction from the object 2 to the position P1 can then
be represented by Equation (11).

[Formula 11]

$$V_1' = V_1 \cos\theta \quad \cdots (11)$$

Here, if the velocity of sound is v, the audio
frequency of the sound emitted from the sound source is f,
and the audio frequency of the sound heard at the virtual
listening point 102 is f1, this audio frequency f1 can be
represented by Equation (12).

[Formula 12]

$$f_1 = \frac{v - V_1'}{v}\, f \quad \cdots (12)$$

As a result, wherever the virtual listening
point 102 is set, the listener can enjoy realistic audio
because the audio frequency of the audio information that
is heard at the virtual listening point 102 is changed
accordingly.

In this manner, according to the present
embodiment, the virtual listening point 102 is set at
a position different from the basic listening point at
which the listener listens to the audio of the object 2,
then the velocity and the moving direction are decided when
such virtual listening point 102 is moved, then an
approaching or leaving velocity of the object 2 that is
observed at the virtual listening point 102 is calculated,
and then the audio frequency of the audio that is heard at
the virtual listening point 102 is changed according to the
calculated velocity. Therefore, even though the virtual
listening point 102 is moved to any place, a realistic
sound field can be generated.

(Fifth Embodiment)

In the present embodiment, when both the object 1
having the video information and the audio information and
the virtual listening point 102 are moved, the audio
frequency of the audio that is heard at the virtual
listening point 102 is changed.

Assume that the object 1 having the video
information and the audio information, as shown in FIG. 1
above, is present. Also, the moving virtual listening
point 102 shown in FIG. 5 is set. Then, if the current
position of the object 1 is P1 (xa, ya, za) shown in
FIG. 6, and the position after the time t has elapsed is
P2 (xb, yb, zb) shown in FIG. 6, the vector between P1 and
P2 can be represented by Equation (13).

[Formula 13]

$$\overrightarrow{P_1 P_2} = (x_b - x_a,\ y_b - y_a,\ z_b - z_a) \quad \cdots (13)$$

The velocity of the object 1 is calculated by taking
the unit of time into account. If the velocity of the
object 1 is set to V1, this velocity V1 can be represented
by Equation (14).

[Formula 14]

$$V_1 = k\,(x_b - x_a,\ y_b - y_a,\ z_b - z_a) \quad \cdots (14)$$

where k is a constant.

Then, cos θ is calculated by using the angle θ
between a vector directed from the position P1 to the
virtual listening point 102 and a vector directed from the
position P1 to the position P2, as shown in FIG. 6. The
component V1' of the velocity V1 of the object 1 in the
direction from the position P1 to the virtual listening
point 102 can then be represented by Equation (15).

[Formula 15]

$$V_1' = V_1 \cos\theta \quad \cdots (15)$$

Similarly, if the current position of the virtual
listening point 102 is P3 (xc, yc, zc) shown in
FIG. 6 and the position after the time t has elapsed is P4
(xd, yd, zd) shown in FIG. 6, the vector between P3 and P4
can be represented by Equation (16).

[Formula 16]

$$\overrightarrow{P_3 P_4} = (x_d - x_c,\ y_d - y_c,\ z_d - z_c) \quad \cdots (16)$$

The velocity of the virtual listening point 102 is
calculated by taking the unit of time into account. If the
velocity of the virtual listening point 102 is set to V2,
this velocity V2 can be represented by Equation (17).

[Formula 17]

$$V_2 = k'\,(x_d - x_c,\ y_d - y_c,\ z_d - z_c) \quad \cdots (17)$$

where k' is a constant.

Then, cos θ2 is calculated by using the angle θ2
between a vector directed from the position P1 to the
position P3 and a vector directed from the position P3 to
the position P4, as shown in FIG. 6. The component V2' of
the velocity V2 in the direction from the position P1 to
the position P3 can then be represented by Equation (18).

[Formula 18]

$$V_2' = V_2 \cos\theta_2 \quad \cdots (18)$$

Here, if the velocity of sound is v, the audio
frequency of the sound source is f, and the audio frequency
of the audio heard at the virtual listening point 102 is
f1, this audio frequency f1 can be represented by Equation
(19).

[Formula 19]

$$f_1 = \frac{v - V_2'}{v - V_1'}\, f \quad \cdots (19)$$

Wherever the virtual listening point 102 is set,
the listener can enjoy the audio with stronger reality
because the audio frequency of the audio information that
is heard at the virtual listening point 102 is changed
into f1.

In this manner, according to the present
embodiment, when both the object 1 and the virtual
listening point 102 are moved, the velocity of the object
1, as observed from the virtual listening point 102,
and the velocity of the virtual listening point 102, as
observed from the object 1, are calculated based on the
position or velocity information and the moving direction
of the object 1 and the position or velocity information
and the moving direction of the virtual listening point
102, and then the audio frequency of the audio that is
heard at the virtual listening point 102 is changed
according to the calculated velocities. Therefore, even if
the virtual listening point 102 is moved to any place, a
realistic sound field can be generated.
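
When both the object and the virtual listening point move, Equations (15), (18), and (19) combine into a single calculation, sketched below in Python. The function name and the 340 m/s speed of sound are assumptions, and the velocities are taken as already known (computed per Equations (14) and (17), or read from the format).

```python
import math

def doppler_both_moving(src_pos, src_vel, lis_pos, lis_vel, f, v=340.0):
    """Equation (19): f1 = (v - V2') / (v - V1') * f.

    src_pos, src_vel -- position P1 and velocity V1 of the object
    lis_pos, lis_vel -- position P3 and velocity V2 of the listening point
    v                -- speed of sound (assumed 340 m/s)
    """
    # Unit vector along the direction from P1 (object) to P3 (listening point).
    line = [l - s for s, l in zip(src_pos, lis_pos)]
    dist = math.sqrt(sum(c * c for c in line))
    if dist == 0.0:
        return f  # positions coincide; direction undefined
    unit = [c / dist for c in line]
    # Equation (15): V1', component of the object velocity along P1 -> P3.
    v1_radial = sum(a * b for a, b in zip(src_vel, unit))
    # Equation (18): V2', component of the listener velocity along P1 -> P3.
    v2_radial = sum(a * b for a, b in zip(lis_vel, unit))
    if v1_radial >= v:
        raise ValueError("object approaches at or above the speed of sound")
    return (v - v2_radial) / (v - v1_radial) * f
```

If the object and the listening point move with the same velocity, V1' equals V2' and the sketch returns f unchanged, as expected when there is no relative motion.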



(Sixth Embodiment)

FIG. 7 is a view explaining a sixth embodiment of 
the present invention. 

As shown in FIG. 7, a virtual listening point 701 is
set. Assume that background data has the audio
information, that the background is moved, and that the
video/audio format has the velocity information or the
position information. Here, assume that the x-y-z axes of
a screen 801 are set as shown in FIG. 8, and that the
background is regarded as an object positioned at
(x, y, z) = (0, 0, t), where t is a constant. Accordingly,
the audio frequency of the audio that is heard at the
virtual listening point 701 is produced by executing the
process of the second embodiment. If the background is
regarded as the object positioned at a center Pa (0, 0, t)
and the velocity of the background is set to V1, the
velocity component V1' in the direction from the center Pa
to the virtual listening point 701 can be represented by
Equation (20) by using the angle θ shown in FIG. 9.

[Formula 20]

$$V_1' = V_1 \cos\theta \quad \cdots (20)$$

Here, if the velocity of sound is v, the audio
frequency of the sound emitted from the sound source is f,
and the audio frequency of the sound heard at the virtual
listening point 701 is f1, this audio frequency f1 can be
represented by Equation (21).

[Formula 21]

$$f_1 = \frac{v}{v - V_1'}\, f \quad \cdots (21)$$

As a result, wherever the virtual listening
point 701 is set, the listener can enjoy the audio with
stronger reality because the audio frequency of the audio
information that is heard at the virtual listening point
701 is changed accordingly.

In order to implement the present embodiment, the
velocity information and the direction information of the
scene, which were encoded previously by an encoder, or the
like, must be described in the scene information. For
example, as shown in FIG. 10, when the velocity information
and the direction information are included in the
information at a certain time within the scene information,
audio that takes the Doppler effect into account can be
generated.

In this manner, according to the present
embodiment, the virtual listening point 701 is set in
the screen on which the video information is projected, and
then the audio frequency of the audio that is heard at the
virtual listening point 701 is changed based on the moving
direction and the moving velocity of the scene, by taking
into account the velocity of the background (regarded as
the object) that is observed at the virtual listening point
701. Therefore, even though the virtual listening point
701 is moved to any place, a realistic sound field can be
generated.

(Seventh Embodiment)

In the present embodiment, the virtual listening
point 102 shown in FIG. 1 above is used as another object.
In the following, this virtual listening point 102 is
treated as an object 3. The position information, or the
velocity information and the direction information, of the
object 1 and the object 3 are obtained from the video
information and the audio information, and then the
velocity components in the direction from the object 1 to
the object 3 are calculated. Assume that the velocity
component of the object 1 in the direction from
the object 1 to the object 3 is V1', the velocity component
of the object 3 in the direction from the object 1
to the object 3 is V2', the velocity of sound is v, the
audio frequency of the sound of the sound source is f, and
the audio frequency of the sound that is heard at the
virtual listening point 102 is f1. Equation (22) is
derived by applying these quantities to the equation
expressing the Doppler effect.
[Formula 22]

$$f_1 = \frac{v - V_2'}{v - V_1'}\, f \quad \cdots (22)$$

Wherever the virtual listening point 102 is set,
the listener can enjoy the audio with stronger reality
because the audio frequency of the audio information that
is heard at the object 3 is changed into f1.

In this way, according to the present embodiment,
one certain object 3 is set as the virtual listening point
102, and then the audio frequency of the audio that is
heard at the set virtual listening point 102 is changed.
Therefore, even if the virtual listening point 102 is moved
to any place, a realistic sound field can be generated.

(Eighth Embodiment) 

In some cases, it is difficult to obtain audio that
is free of the Doppler effect when the video information
and the audio information are captured at the time of
actual imaging. Also, in many cases, the Doppler effect
has already been reflected in the audio replayed by a
current video/audio player such as a DVD player or an
MPEG 4 player. Even when the virtual listening point is
moved to an arbitrary place in such a sound field, the
present embodiment makes it possible to obtain the Doppler
effect appropriate to that place.

The MPEG player is produced under the assumption
that basically the listener listens to the audio at a basic
listening point 1001 shown in FIG. 11. In that case, when
the object 1 has audio data, the recorded audio is
sometimes audio in which the Doppler effect has already
been taken into consideration as the sound that is to be
heard at the basic listening point 1001. Assume that the
object 1 is moving at the velocity V1, and that the audio
frequency of the audio that is heard at the basic listening
point 1001 is f1. The velocity component V1' of the object
1 in the direction from the object 1 to the basic listening
point 1001 is given by Equation (23).

[Formula 23]

$$V_1' = V_1 \cos\theta_1 \quad \cdots (23)$$

The audio frequency f1 of the audio that is heard
at the basic listening point 1001 can be represented as
shown in Equation (24).

[Formula 24]

$$f_1 = \frac{v}{v - V_1'}\, f \quad \cdots (24)$$

Then, if the audio frequency of the audio
information of the object 1 in which the Doppler effect is
disregarded is set to f, this frequency can be
represented by the following Equation (25).

[Formula 25]

$$f = \frac{v - V_1'}{v}\, f_1 \quad \cdots (25)$$

In this manner, if the inverse calculation of the
Doppler effect is executed, the audio frequency of the
audio information in which the Doppler effect is not taken
into consideration can be derived from the audio frequency
of the audio information in which the Doppler effect is
taken into consideration.

Then, when the audio that is heard at a virtual
listening point 1002 is to be generated, the audio
frequency of the audio information that is heard at the
virtual listening point 1002 can be derived from the audio
frequency of the audio information in which the Doppler
effect is disregarded, according to the formulae shown in
the first, second, third, sixth, and seventh embodiments.
Here, the audio frequency of the audio information that
is to be heard at the virtual listening point 1002 is
derived under the assumption that the virtual listening
point 1002 is not moved.

In FIG. 12, assume that the audio frequency of the
audio information that is to be heard at the virtual
listening point 1002 is set to f2. If the component of the
velocity V1 of the object 1 in the direction from
the object 1 to the virtual listening point 1002 is set to
V2, this component can be represented by Equation (26).

[Formula 26]

$$V_2 = V_1 \cos\theta_2 \quad \cdots (26)$$

Thus, Equation (27) is satisfied.

[Formula 27]

$$f_2 = \frac{v}{v - V_2}\, f \quad \cdots (27)$$

If the following Equation (28), which holds for the
object 1 and the basic listening point, is substituted into
Equation (27), Equation (29) can be derived.

[Formula 28]

$$f_1 = \frac{v}{v - V_1'}\, f \quad \cdots (28)$$

[Formula 29]

$$f_2 = \frac{v - V_1'}{v - V_2}\, f_1 \quad \cdots (29)$$

Even though the position of the virtual listening point
1002 is changed into any place on the coordinate axes, the
listener can enjoy the audio with stronger reality by
adding the appropriate Doppler effect in response to that
location.
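
The inverse calculation and the re-application of the Doppler effect in Equations (25), (27), and (29) can be sketched in Python as follows. The function names are hypothetical, and the 340 m/s speed of sound is an assumed default; the sketch only handles a single frequency value, not an audio stream.

```python
def remove_doppler(f1, v1_basic, v=340.0):
    """Equation (25): recover f from the frequency f1 heard at the basic point.

    v1_basic is V1', the object's velocity component toward the basic
    listening point; v is the speed of sound (assumed 340 m/s).
    """
    return (v - v1_basic) / v * f1

def apply_doppler(f, v2, v=340.0):
    """Equation (27): frequency f2 heard at a fixed virtual listening point.

    v2 is the object's velocity component toward the virtual listening point.
    """
    return v / (v - v2) * f

def retarget(f1, v1_basic, v2, v=340.0):
    """Equation (29): go from f1 to f2 directly, f2 = (v - V1') / (v - V2) * f1."""
    return (v - v1_basic) / (v - v2) * f1
```

Applying `remove_doppler` and then `apply_doppler` gives the same value as `retarget`, which is simply the algebraic combination of the two steps; if the virtual listening point sees the same radial component as the basic listening point, the frequency is left unchanged.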

In this fashion, according to the present
embodiment, if there is audio information to which the
Doppler effect obtained when the audio is heard at a
certain place has already been added, audio information
to which the Doppler effect is not applied is generated by
executing the inverse calculation of the Doppler effect.
Then, when the sound field at the virtual listening point
is to be produced, the Doppler effect is added by using
the audio information to which the Doppler effect is not
applied. Therefore, when a plurality of sound fields are
to be generated from one audio stream, sound fields with
stronger reality can be generated.

Also, according to the present embodiment, the
audio in which the Doppler effect is disregarded can be
loaded on the audio streams of the respective objects, the
sound fields that are heard as if in multiple channels can
be generated from the audio information in one channel, and
the size of the audio information can also be reduced.

(Ninth Embodiment) 

In the present embodiment, the velocities of the
object and the virtual listening point are calculated when,
for example, no next image follows the final image of the
title.

In some cases the velocity cannot be calculated from
the coordinates of the next image, either because the next
image is not present or because, when the screen is
switched, the object or the virtual listening point does
not have the velocity information at the timing one image
earlier. In such cases, a time axis is set as shown in
FIG. 13, and the audio frequency of the audio that is heard
at the virtual listening point in the final image unit (the
final VOBU, the final cell, or the like) is calculated by
applying the formula for the audio frequency heard at the
virtual listening point one image unit earlier to the audio
frequency of the audio that is emitted from the object in
the final image unit. The audio frequency of the audio of
the object 1 that is heard at the virtual listening point
102 shown in FIG. 13 can be represented by Equation (19)
shown in the fifth embodiment above.

[Formula 30]

$$f_1 = \frac{v - V_2'}{v - V_1'}\, f \quad \cdots (19)$$

As a result, if the audio frequency of the audio
that is emitted from the object 1 in the final image unit
is set to f', the audio frequency f1' of the object 1
that is heard at the virtual listening point 102 in the
final image unit can be represented by the following
Equation (30).

[Formula 31]

$$f_1' = \frac{v - V_2'}{v - V_1'}\, f' \quad \cdots (30)$$

In this manner, according to the present
embodiment, if the position information of the next screen
cannot be obtained from the final screen unit of the title,
or the like, the velocity information of the object or the
velocity information of the virtual listening point is
obtained from the preceding image, and then the audio
frequency of the audio of the object that is heard at the
virtual listening point is calculated. Therefore, even
though the virtual listening point is moved to any place,
a realistic sound field can be generated.
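
The fallback described in this embodiment can be sketched in Python as follows: velocities are computed from consecutive image units, and the final unit, which has no successor, reuses the velocity of the preceding unit. The function name and the list-of-positions input are assumptions made for illustration.

```python
def velocities_per_unit(positions, dt):
    """Velocity for every image unit, given one (x, y, z) position per unit.

    positions -- list of coordinate tuples, one per image unit (hypothetical input)
    dt        -- duration of one image unit in seconds

    Each unit's velocity is taken from the displacement to the next unit.
    The final unit has no next image, so it reuses the preceding unit's
    velocity, as in the present embodiment.
    """
    if len(positions) < 2:
        raise ValueError("at least two image units are needed")
    velocities = []
    for cur, nxt in zip(positions, positions[1:]):
        velocities.append(tuple((b - a) / dt for a, b in zip(cur, nxt)))
    velocities.append(velocities[-1])  # final unit: carry over preceding velocity
    return velocities
```

The carried-over velocities of the object and of the virtual listening point can then be fed into Equation (30) with the final unit's source frequency f'.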

(Tenth Embodiment)

In order to calculate the actual velocity from
coordinate data on the screen in plural time units, reduced
scale information of the screen must be provided. Since
the reduced scale information differs scene by scene,
such reduced scale information must be provided for every
scene. For this reason, in the present embodiment, as
shown in FIG. 14, a video/audio format that has reduced
scale information, which has been encoded previously by the
encoder, or the like, in the scene information is
implemented.
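
How reduced scale information might enter the velocity calculation can be sketched in Python as follows. The interpretation of the scale as "real-world metres per screen coordinate unit" and the function name are assumptions; the actual encoding of the scale information in FIG. 14 is not specified here.

```python
def actual_velocity(p1, p2, dt, scale):
    """Convert a displacement in screen coordinates to an actual velocity.

    p1, p2 -- screen coordinates (x, y, z) at two times dt seconds apart
    scale  -- per-scene reduced scale information, assumed here to mean
              real-world metres represented by one screen coordinate unit
    """
    # Under this assumption the constant k of Equation (2) becomes scale / dt.
    return tuple(scale * (b - a) / dt for a, b in zip(p1, p2))
```

Because the scale differs scene by scene, the same on-screen displacement can correspond to very different actual velocities, and hence to different Doppler shifts.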

In this case, the audio information transforming
methods explained in the ninth embodiment to the tenth
embodiment are each formatted as a program and then
recorded in a recording medium, such as a memory in
which a decoder for decoding the video/audio format and a
decoding program are recorded, or a memory in which a
program for controlling the decoder is recorded. As a
result, a video/audio player (DVD player, LD player, MPEG
player, system in the movie theater, etc.) that can
achieve the advantages of the respective embodiments can be
implemented.

An example of an audio information transforming
device for implementing the embodiments mentioned above is
explained as follows with reference to Fig. 15.

In Fig. 15, an audio information transforming
device 1500 includes a video/audio format 1510, a virtual
listening point setting section 1520, a relative velocity
calculating section 1530, and an audio frequency
transforming section 1540.

The video/audio format 1510 includes video
information, position information, audio information,
velocity information, and the like with respect to each
object on a screen. The virtual listening point setting
section 1520 sets the virtual listening point (for example,
101 of FIG. 1). The relative velocity calculating section
1530 calculates the velocity of an object (for example,
object 1 of FIG. 1) by comparing the position information
of the object 1 at a certain time with the position
information of the object 1 after a predetermined time has
passed from that time, and then calculates the relative
velocity between the virtual listening point 101 and the
object 1 according to the position information of the
virtual listening point 101 and the velocity of the object
1. If the velocity information of the object 1 is included
in the video/audio format 1510, the relative velocity
calculating section 1530 extracts the velocity information
of the object 1 from the video/audio format 1510 instead of
calculating the velocity of the object 1.

Then, the audio frequency transforming section 1540
changes the audio information at the virtual listening
point 101 based on the obtained relative velocity.
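The chain described above (velocity from two positions, relative velocity toward a fixed listening point, then a frequency change) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the speed of sound is taken as 340 m/s, and the textbook Doppler relation for a stationary listener is used in place of whatever formula the sections 1530 and 1540 actually apply.

```python
import math

SPEED_OF_SOUND = 340.0  # m/s; assumed value for this sketch


def velocity_from_positions(p_earlier, p_later, dt):
    """Velocity vector from two positions sampled dt seconds apart."""
    return tuple((b - a) / dt for a, b in zip(p_earlier, p_later))


def approach_speed(source_pos, source_vel, listener_pos):
    """Speed of the source along the line to the listener (positive when approaching)."""
    offset = tuple(l - s for s, l in zip(source_pos, listener_pos))
    distance = math.sqrt(sum(c * c for c in offset))
    if distance == 0.0:
        return 0.0
    return sum(v * c for v, c in zip(source_vel, offset)) / distance


def shifted_frequency(f_source, v_approach, c=SPEED_OF_SOUND):
    """Frequency heard by a stationary listener for a moving source."""
    return f_source * c / (c - v_approach)


# Object 1 moves 20 m toward a listening point at the origin in one second.
velocity = velocity_from_positions((-100.0, 0.0), (-80.0, 0.0), dt=1.0)
v_rel = approach_speed((-80.0, 0.0), velocity, (0.0, 0.0))
heard = shifted_frequency(440.0, v_rel)
```

In this example a 440 Hz source approaching at 20 m/s is heard at a higher frequency; reversing the direction of motion makes `v_rel` negative and lowers the result.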

If the virtual listening point setting section 1520
sets the point 102 (moving object 3) of FIG. 1 as the
virtual listening point and the object 1 of FIG. 1 is
regarded as a sound source, the relative velocity
calculating section 1530 calculates the velocities of both
the virtual listening point 102 and the object 1, or
extracts the velocity information of the virtual listening
point 102 and the object 1. The relative velocity
calculating section 1530 then calculates the relative
velocity between the moving object 1 and the moving virtual
listening point 102 based on the obtained velocities.
According to the calculated relative velocity, the audio
frequency transforming section 1540 changes the audio
information at the virtual listening point 102.
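When both the sound source and the virtual listening point move, the frequency change depends on the components of both velocities along the line joining them. The Python sketch below uses the standard Doppler relation for a moving source and a moving listener; the function name, the 340 m/s speed of sound, and the sample coordinates are assumptions for illustration and are not taken from the embodiment.

```python
import math


def doppler_frequency(f_source, src_pos, src_vel, lis_pos, lis_vel, c=340.0):
    """Frequency heard at a moving listening point for a moving source."""
    offset = [l - s for s, l in zip(src_pos, lis_pos)]  # source -> listener
    distance = math.sqrt(sum(x * x for x in offset))
    if distance == 0.0:
        return f_source
    unit = [x / distance for x in offset]
    # Speed of the source toward the listener.
    v_source = sum(v * u for v, u in zip(src_vel, unit))
    # Speed of the listener toward the source (opposite direction along the line).
    v_listener = -sum(v * u for v, u in zip(lis_vel, unit))
    return f_source * (c + v_listener) / (c - v_source)


# Object 1 and listening point 102 closing on each other.
closing = doppler_frequency(440.0, (0.0, 0.0), (20.0, 0.0), (100.0, 0.0), (-10.0, 0.0))
# Both moving with the same velocity: no relative motion, no shift.
together = doppler_frequency(440.0, (0.0, 0.0), (20.0, 0.0), (100.0, 0.0), (20.0, 0.0))
```

The second call shows the limiting case: when the object and the listening point share the same velocity, the heard frequency equals the emitted one.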

If only the velocity information of the object 1 is
included in the video/audio format 1510, the relative
velocity calculating section 1530 calculates the velocity
of the virtual listening point 102 by comparing the
position information of the virtual listening point 102 at
a certain time with that after a predetermined time has
elapsed, and extracts the velocity information of the
object 1 from the video/audio format 1510.

If only the velocity information of the virtual
listening point 102 is included in the video/audio format
1510, the relative velocity calculating section 1530
calculates the velocity of the object 1 by comparing the
position information of the object 1 at a certain time with
that after a predetermined time has elapsed, and extracts
the velocity information of the virtual listening point 102
from the video/audio format 1510.
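The choice between extracting and calculating a velocity can be expressed as a small fallback rule applied independently to the object and to the virtual listening point. The Python sketch below is a hypothetical illustration: `resolve_velocity` and its arguments are names invented for this example, and `None` stands in for "velocity information not included in the format".

```python
def resolve_velocity(encoded_velocity, p_earlier, p_later, dt):
    """Use the encoded velocity if the format carries one, else derive it from positions."""
    if encoded_velocity is not None:
        return tuple(encoded_velocity)
    return tuple((b - a) / dt for a, b in zip(p_earlier, p_later))


# Object 1 has velocity information in the format; listening point 102 does not.
object_velocity = resolve_velocity((3.0, 4.0), (0.0, 0.0), (10.0, 0.0), dt=1.0)
listener_velocity = resolve_velocity(None, (0.0, 0.0), (10.0, 0.0), dt=2.0)
```

Applying the same rule to each party covers all four combinations described above without separate code paths.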

Moreover, if the background is moving and has audio 
information, it is possible to consider the moving 
background as a moving object which is a sound source. In 
this case, it is possible to set another moving object as a 
virtual listening point. 

Advantages of the Invention 

As described in detail above, according to the
audio information transforming method set forth in Claim 1,
for an object having the video/audio information
constituting a scene replayed on the screen in a
video/audio format such as MPEG 4, the Doppler effect can
be added to the audio information at the virtual listening
point such that, for example, the frequency of the sound is
increased if the object approaches the virtual listening
point and decreased if the object moves away from it.
Therefore, an audio environment with strong appeal/reality,
which lets the listener feel as if he or she has entered
the video (at the virtual listening point), can be
produced.

According to the audio information transforming
method set forth in Claim 2, the Doppler effect caused by
the movement of the object can be calculated/processed
easily by using the coded position information of the
object. Therefore, an audio environment with
appeal/reality, which enables the listener to grasp from
the audio that the object on the screen is moving as seen
from the virtual listening point, can be produced.

According to the audio information transforming
method set forth in Claim 3, there is no need to calculate
the velocity of the object by computation, and the burden
of the calculating process can be reduced correspondingly.
In addition, the processing speed can be improved.

According to the audio information transforming
method set forth in Claim 4, the Doppler effect caused by
the movement of the virtual listening point can be
calculated/processed easily by using the position
information of the virtual listening point. Therefore, an
audio environment with appeal/reality, which enables the
listener to grasp from the audio that the listener himself
or herself (positioned at the virtual listening point) is
moving, can be produced.

According to the audio information transforming
method set forth in Claim 5, there is no need to calculate
the velocity of the virtual listening point by computation,
and the burden of the calculating process can be reduced
correspondingly. In addition, the processing speed can be
improved.

According to the audio information transforming
method set forth in Claim 6, for a scene replayed on the
screen in a video/audio format such as DVD, the Doppler
effect is added to the audio information at the virtual
listening point in response to the moving speed of the
background. Therefore, an audio environment with strong
appeal/reality, which lets the listener feel as if he or
she has entered the video (at the virtual listening point)
and grasp from the audio that the background of the screen
is moving as seen from the virtual listening point, can be
produced.

According to the audio information transforming
method set forth in Claim 7, in the case that the audio
information of the object already includes a Doppler
effect, that Doppler effect is first canceled, and then the
Doppler effect is added to the audio information at the
virtual listening point. Therefore, even if a Doppler
effect is included in the audio information prior to the
transformation, the Doppler effect caused when the object
on the screen moves as seen from the virtual listening
point can be expressed precisely.
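One way to picture the cancel-then-add sequence is as two frequency ratios applied in turn. The Python sketch below assumes that the Doppler ratio baked into the recorded audio is known or can be estimated, which this section does not state; the function names, the 340 m/s speed of sound, and the sample speeds are likewise assumptions for illustration.

```python
SPEED_OF_SOUND = 340.0  # m/s; assumed value for this sketch


def doppler_ratio(v_source, v_listener=0.0, c=SPEED_OF_SOUND):
    """Heard/emitted frequency ratio; speeds are positive when closing the distance."""
    return (c + v_listener) / (c - v_source)


def retarget_frequency(f_recorded, recorded_ratio, new_ratio):
    """Cancel the Doppler effect already in the audio, then add the new one."""
    f_emitted = f_recorded / recorded_ratio  # cancel the recorded shift
    return f_emitted * new_ratio             # add the shift for the listening point


# Audio recorded with the source approaching at 20 m/s, replayed for a
# virtual listening point from which the source recedes at 20 m/s.
recorded = doppler_ratio(v_source=20.0)
new = doppler_ratio(v_source=-20.0)
f_out = retarget_frequency(467.5, recorded, new)
```

If the recorded shift were left in place, the two ratios would compound and the pitch at the virtual listening point would be wrong by the factor `recorded`.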

According to the audio information transforming
method set forth in Claim 8, in the case that the position
information of the succeeding screen cannot be obtained,
for example at the time of the final image of the title
that is being replayed, the audio frequency of the object
as heard at the virtual listening point can be calculated
by using the formula of the audio frequency transformation
obtained in the audio frequency transformation processing
for the image preceding the final image. Therefore, the
possibility that the audio frequency transformation cannot
be executed in the final image of the title or the like
because of a lack of information can be eliminated.
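The fallback for the final image can be sketched as "hold the last transformation". In the Python example below each image's frequency ratio is derived from its position and the succeeding one, and the final image, which has no successor, reuses the ratio of the preceding image. Reusing a single ratio is a simplification of "the formula of the audio frequency transformation"; the function names and the 340 m/s speed of sound are assumptions.

```python
import math

SPEED_OF_SOUND = 340.0  # m/s; assumed value for this sketch


def ratio_between(p_now, p_next, listener, dt, c=SPEED_OF_SOUND):
    """Frequency ratio for a stationary listener from two successive positions."""
    velocity = [(b - a) / dt for a, b in zip(p_now, p_next)]
    offset = [l - s for s, l in zip(p_now, listener)]
    distance = math.sqrt(sum(x * x for x in offset))
    if distance == 0.0:
        v_approach = 0.0
    else:
        v_approach = sum(v * x for v, x in zip(velocity, offset)) / distance
    return c / (c - v_approach)


def ratios_for_title(positions, listener, dt):
    """One ratio per image; the final image reuses the preceding image's ratio."""
    ratios = []
    previous = 1.0  # used only if the title has a single image
    for i, p_now in enumerate(positions):
        if i + 1 < len(positions):
            previous = ratio_between(p_now, positions[i + 1], listener, dt)
        ratios.append(previous)
    return ratios


ratios = ratios_for_title(
    [(-100.0, 0.0), (-80.0, 0.0), (-60.0, 0.0)], (0.0, 0.0), dt=1.0
)
```

Every image, including the last, therefore receives a ratio, so the transformation never stalls for lack of a succeeding position.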

According to the audio information transforming
method set forth in Claim 9, when the reduced scale of the
screen is changed by zoom-in, zoom-out, or the like of the
replayed screen, the audio information transformation set
forth in Claims 1 to 8 can be executed precisely.

According to the video/audio format set forth in
Claim 10, the velocity information of the object, the
velocity information and the direction information of the
scene, and the reduced scale information of the screen for
every scene are encoded by the encoder set forth in Claim
11 and are then included in the video/audio format.
Therefore, the audio information transformation set forth
in any one of Claims 1 to 9 can be implemented.

According to the audio information transforming
program set forth in Claim 12, for an object having the
video/audio information constituting a scene replayed on
the screen in a video/audio format such as MPEG 4, the
Doppler effect can be added to the audio information at the
virtual listening point such that, for example, the
frequency of the sound is increased if the object
approaches the virtual listening point and decreased if the
object moves away from it. Therefore, if a recording medium
(a memory such as a ROM, or the like) in which this program
is recorded is employed, a video/audio player (DVD player,
LD player, game, MPEG player, system in a movie theater,
etc.) that can produce an audio environment with
appeal/reality, which lets the listener feel as if he or
she has entered the video (at the virtual listening point),
can be implemented.

According to the audio information transforming
program set forth in Claim 13, the Doppler effect caused by
the movement of the object can be calculated/processed
easily by using the coded position information of the
object. Therefore, if a recording medium (a memory such as
a ROM, or the like) in which this program is recorded is
employed, a video/audio player (DVD player, LD player,
game, MPEG player, system in a movie theater, etc.) that
can produce an audio environment with appeal/reality, which
enables the listener to grasp from the audio that the
object on the screen is moving as seen from the virtual
listening point, can be implemented.

According to the audio information transforming
program set forth in Claim 14, there is no need to
calculate the velocity of the object by computation, the
burden of the calculating process can be reduced
correspondingly, and in addition the processing speed can
be improved. Therefore, if a recording medium (a memory
such as a ROM, or the like) in which this program is
recorded is employed, a video/audio player (DVD player, LD
player, game, MPEG player, system in a movie theater, etc.)
that can produce an audio environment with appeal/reality,
which enables the listener to grasp from the audio that the
object on the screen is moving as seen from the virtual
listening point, can be implemented.

According to the audio information transforming
program set forth in Claim 15, the Doppler effect caused by
the movement of the virtual listening point can be
calculated/processed easily by using the position
information of the virtual listening point. Therefore, if a
recording medium (a memory such as a ROM, or the like) in
which this program is recorded is employed, a video/audio
player (DVD player, LD player, game, MPEG player, system in
a movie theater, etc.) that can produce an audio
environment with appeal/reality, which enables the listener
to grasp from the audio that the listener himself or
herself (positioned at the virtual listening point) is
moving, can be implemented.

According to the audio information transforming
program set forth in Claim 16, there is no need to
calculate the velocity of the virtual listening point by
computation, the burden of the calculating process can be
reduced correspondingly, and in addition the processing
speed can be improved. Therefore, if a recording medium (a
memory such as a ROM, or the like) in which this program is
recorded is employed, a video/audio player (DVD player, LD
player, game, MPEG player, system in a movie theater, etc.)
that can produce an audio environment with appeal/reality,
which enables the listener to grasp from the audio that the
listener himself or herself is moving, can be implemented.

According to the audio information transforming
program set forth in Claim 17, for a scene replayed on the
screen in a video/audio format such as DVD, the Doppler
effect is added to the audio information at the virtual
listening point in response to the moving speed of the
background. Therefore, if a recording medium (a memory such
as a ROM, or the like) in which this program is recorded is
employed, a video/audio player (DVD player, LD player,
game, MPEG player, system in a movie theater, etc.) that
can produce an audio environment with strong appeal/reality
can be implemented.

According to the audio information transforming
program set forth in Claim 18, even if a Doppler effect is
included in the audio information prior to the
transformation, the Doppler effect caused when the object
on the screen moves as seen from the virtual listening
point can be expressed precisely. Therefore, if a recording
medium (a memory such as a ROM, or the like) in which this
program is recorded is employed, a video/audio player (DVD
player, LD player, game, MPEG player, system in a movie
theater, etc.) that can produce an audio environment with
strong appeal/reality can be implemented.

According to the audio information transforming
program set forth in Claim 19, in the case that the
position information of the succeeding screen cannot be
obtained, for example at the time of the final image of the
title that is being replayed, the audio frequency of the
object as heard at the virtual listening point can be
calculated by using the formula of the audio frequency
transformation obtained in the audio frequency
transformation processing for the image preceding the final
image. Therefore, the possibility that the audio frequency
transformation cannot be executed in the final image of the
title or the like because of a lack of information can be
eliminated. As a result, if a recording medium (a memory
such as a ROM, or the like) in which this program is
recorded is employed, a video/audio player (DVD player, LD
player, game, MPEG player, system in a movie theater, etc.)
that can produce an audio environment with strong
appeal/reality can be implemented.

According to the audio information transforming
program set forth in Claim 20, when the reduced scale of
the screen is changed by zoom-in, zoom-out, or the like of
the replayed screen, the audio information transformation
can be executed precisely. Therefore, if a recording medium
(a memory such as a ROM, or the like) in which this program
is recorded is employed, a video/audio player (DVD player,
LD player, game, MPEG player, system in a movie theater,
etc.) that can produce an audio environment with strong
appeal/reality can be implemented.

According to the audio information transforming
device set forth in Claim 21, for an object having the
video/audio information constituting a scene replayed on
the screen in a video/audio format such as MPEG 4, the
Doppler effect can be added to the audio information at the
virtual listening point such that, for example, the
frequency of the sound is increased if the object
approaches the virtual listening point and decreased if the
object moves away from it. Therefore, if this audio
information transforming device is employed, an audio
environment with strong appeal/reality, which lets the
listener feel as if he or she has entered the video (at the
virtual listening point), can be produced.

According to the audio information transforming
device set forth in Claim 22, an audio environment with
appeal/reality, which lets the listener feel as if he or
she has entered the video (at the virtual listening point)
and grasp from the audio either that the object on the
screen is moving as seen from the virtual listening point
or that the listener himself or herself is moving, can be
produced.

According to the audio information transforming
device set forth in Claim 23, an audio environment with
appeal/reality, which lets the listener feel as if he or
she has entered the video (at the virtual listening point)
and grasp from the audio that the object on the screen is
moving as seen from the virtual listening point, can be
produced.

According to the audio information transforming
device set forth in Claim 24, an audio environment with
appeal/reality, which lets the listener feel as if he or
she has entered the video (at the virtual listening point)
and grasp from the audio that the listener himself or
herself (positioned at the virtual listening point) is
moving, can be produced.

According to the audio information transforming
device set forth in Claim 25, for a scene replayed on the
screen in a video/audio format such as DVD, the Doppler
effect is added to the audio information at the virtual
listening point in response to the moving speed of the
background. Therefore, an audio environment with
appeal/reality, which lets the listener feel as if he or
she has entered the video (at the virtual listening point)
and grasp from the audio that the background of the screen
is moving as seen from the virtual listening point, can be
produced.